Team Performance with Test Scores

Team performance is a ubiquitous area of inquiry in the social sciences, and it motivates the problem of team
selection — choosing the members of a team for maximum performance. Influential work of Hong and Page 
has argued that testing individuals in isolation and then assembling the highest-scoring ones into a team is 
not an effective method for team selection. For a broad class of performance measures, based on the expected 
maximum of random variables representing individual candidates, we show that tests directly measuring 
individual performance are indeed ineffective, but that a more subtle family of tests used in isolation can 
provide a constant-factor approximation for team performance. These new tests measure the “potential” of 
individuals, in a precise sense, rather than performance; to our knowledge they represent the first time 
that individual tests have been shown to produce near-optimal teams for a non-trivial team performance 
measure. We also show families of submodular and supermodular team performance functions for which
no test applied to individuals can produce near-optimal teams, and discuss implications for submodular 
maximization via hill-climbing. 


1. INTRODUCTION 

The performance of teams in solving problems has been a subject of considerable
interest in multiple areas of the mathematical social sciences [Gully et al. 2002;
Kozlowski and Ilgen 2006; Wuchty et al. 2007]. The ways in which groups of people
come together and accomplish tasks are an important issue in theories of organizations,
innovation, and other collective phenomena, and the recent growth of interest in
crowdwork has brought these issues into focus for on-line platforms as well.

In formal models of team performance, a central issue is the problem of team selection.
Suppose there is a task to be accomplished and we can assemble a team to
collectively work on this task, drawing team members from a large set $U$ of $n$ candidates.
(We can think of $U$ as the job applicants for this task.) A team can be any
subset $T \subseteq U$, and its performance in collectively working on the task is given by a
set function $g(T)$. The central optimization problem is therefore a kind of set function
maximization: given a target size $k < n$ for the team, we would like to find a set $T$ of
cardinality $k$ for which $g(T)$ is as large as possible.

The generality of this framework has meant that it can be used to reason about a 
wide range of settings in which we hire workers, solicit advice from a committee, run a 
crowdsourced contest, admit college applicants, and many other activities — all cases 
where we have an objective function (the outcome of the work performed, the quality of 
the insights obtained, or reputation of the group that is assembled) that is a function 
of the set of people we bring together.

Models of Team Performance. Different models of team performance can be interpreted
as positing different forms for the structure of the set function $g(\cdot)$. Some of
the most prominent have been the following.

— Cumulative effects. Arguably the simplest team performance function is a linear one: 
each individual can produce work at a certain volume, and the team’s performance is 
simply the sum of these individual outputs. Formally, we assume that each individual 
$i \in U$ has a weight $w_i$, and then $g(T) = \sum_{i \in T} w_i$.

— Contests. Much work has focused on models of team performance in which the “team” 
is highly decoupled: members attempt the task independently, and the quality of the 
outcome is the maximum quality produced by any member. Such formalisms arise 
in the study of contest-like processes, where many competitors independently contribute
proposed solutions, and a coordinator selects the best one (or perhaps the $h$
best for some $h \le k$) [Jeppesen and Lakhani 2010; Lakhani et al. 2013]. Note however
that this objective function is applicable more generally to any setting with a
“contest structure,” even potentially inside a single organization, where proposed solutions
are generated independently and the outcome is judged by the quality of the
best one (or best few). 

— Complementarity. Related to contests are models in which each team member has a 
set of “perspectives,” and the quality of the team’s performance grows with the number
of distinct perspectives that they are collectively able to provide [Hong and Page
2004; Marcolino et al. 2013].

— Synergy. In a different direction, research has also considered models of team performance
in which interaction is important, using objective functions with terms that
generate value from pairwise interaction between team members [Ballester et al.
2006].

These settings are not just different in their motivation; they rely on functions $g(\cdot)$
with genuinely different combinatorial properties. In particular, in the language of set
functions, the first class of instances is based on modular (i.e. linear) functions, the
second and third classes are based on submodular functions, and the fourth is based
on supermodular functions.
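
As a rough illustration of these four classes, the following Python sketch brute-forces the comparison of $g(A) + g(B)$ with $g(A \cup B) + g(A \cap B)$ on a four-candidate toy instance. The weights, perspective sets, and function names are hypothetical choices made for this sketch, and the contest function uses deterministic outputs, which is only a degenerate case of the random-variable model discussed later.

```python
from itertools import combinations

def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def classify(g, ground, tol=1e-9):
    """Brute-force check of g(A) + g(B) against g(A | B) + g(A & B)."""
    sub = sup = True
    all_sets = list(subsets(ground))
    for a in all_sets:
        for b in all_sets:
            gap = g(a) + g(b) - g(a | b) - g(a & b)
            if gap < -tol:
                sub = False   # submodularity needs gap >= 0 everywhere
            if gap > tol:
                sup = False   # supermodularity needs gap <= 0 everywhere
    if sub and sup:
        return "modular"
    return "submodular" if sub else "supermodular" if sup else "neither"

# Hypothetical toy data for candidates 0..3.
weights = {0: 3.0, 1: 1.0, 2: 2.0, 3: 0.5}
perspectives = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}, 3: {"d"}}

def cumulative(team):
    """Sum of individual outputs."""
    return sum(weights[i] for i in team)

def contest(team):
    """Best single output (deterministic special case of a contest)."""
    return max((weights[i] for i in team), default=0.0)

def complementarity(team):
    """Number of distinct perspectives covered by the team."""
    return len(set().union(*(perspectives[i] for i in team)))

def synergy(team):
    """Value generated by every pair of team members."""
    return sum(weights[i] * weights[j] for i, j in combinations(sorted(team), 2))

for g in (cumulative, contest, complementarity, synergy):
    print(g.__name__, classify(g, weights))
```

The exhaustive check is exponential in the size of the ground set, so it is only meant for tiny instances like this one.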

The second and third classes of functions (contests and complementarity) play
a central role in Scott Page’s highly influential line of work on the power of diversity
in team performance [Page 2008]. The argument, in essence, is that a group with diversity
that is reflected in independent solutions or complementary perspectives can
often outperform a group of high-achieving but like-minded members.

Evaluating Team Members via Tests. A key issue that Page’s work brings to the
fore is the question of tests and their effectiveness in identifying good team members
[Page 2008]. In most settings one can’t “preview” the behavior of a set of team members
together, and so a fundamental approach to team formation is to give each candidate
$i \in U$ a test, resulting in a test score $f(i)$ [Miller 2001]. It is natural to then select
the $k$ candidates with the highest test scores, resulting in a team $T$. We could think
of the test score $f(i)$ as corresponding to the SAT or GRE score in the case of college
or graduate school admissions, or as corresponding to the quality of answers to a set of
technical interview questions in a job interview.

Should we expect that the $k$ individuals who score highest on the test will indeed
make the best team? In a simple enough setting, the answer is yes: for modular
functions $g(T) = \sum_{i \in T} w_i$, it is enough to evaluate each candidate $i$ in isolation,
applying the test $f(i) = g(\{i\}) = w_i$. Let us refer to $f(i) = g(\{i\})$ in general as the canonical
test: we simply see how $i$ would perform as a one-element set. For modular functions,
clearly the $k$ candidates with the highest scores under the canonical test form the best
team.

On the other hand, Hong and Page construct an example, based on complementarity,
in which the $k$ candidates who score highest on the canonical test perform significantly
worse as a team than a set of $k$ randomly selected candidates [Hong and Page 2004].
Their mathematical analysis has a natural interpretation with implications for hiring
and admissions processes: the $k$ candidates who score highest on the test are too similar
to each other, and so with an objective function based on complementarity, they
collectively represent many fewer perspectives than a random set of $k$ candidates.

Beyond these compelling examples, however, there is very little broader theoretical
understanding of the power of tests in selecting teams. Thinking of tests as arbitrary
functions of the candidates is not a perspective that has been present in this earlier
work; a particularly unexplored issue is the fact that the failure of the canonical test
doesn’t necessarily rule out the possibility that other tests might be effective in assembling
teams. Does it ever help, in a formal sense, to evaluate a candidate using a measure
$f(i)$ that is different from his or her actual individual performance at the task? In
real settings, we see many cases where employers, search committees, or admissions
committees evaluate applicants on their “potential” rather than on their demonstrated
performance. Is this simply a practice that has evolved for reasons of its own, or does
it have a reflection in a formal model of team selection? Without a general formulation
of tests as a means for evaluating team members, it is difficult to offer insights into
these basic questions.

The Present Work: Effective Tests for Team Selection. In this paper we analyze
the power of general tests in forming teams across a range of models. Our main
result is the finding that for team performance measures that have a contest structure,
near-optimal teams can be selected by giving each candidate a test in isolation,
and then ranking by test scores, but only using tests that are quite different from the
canonical test. To our knowledge, this is the first result to establish that non-standard
tests can yield good team performance in settings where the canonical test provably
fails.

In more detail, in a contest structure each candidate $i \in U$ has an associated discrete
random variable $X_i$, with all random variables mutually independent, and the performance
of a team $T \subseteq U$ is the expected value of the random variable $\max_{i \in T} X_i$. More
generally, we may care about the top $h$ values, for a parameter $h \le k$, in which case the
performance of $T$ is the expected value of the sum of the $h$ largest random variables in
$T$:

$$g(T) = E\left[\max_{S \subseteq T,\, |S| = h} \sum_{i \in S} X_i\right].$$


The test that works well for these contest functions has a natural and appealing
interpretation. Focusing on the general case with parameter $h \le k$, we define the test
score $f(i)$ to be

$$f(i) = E\left[\max\left(X_i^{(1)}, X_i^{(2)}, \ldots, X_i^{(k/h)}\right)\right] \qquad (1)$$

where $X_i^{(1)}, X_i^{(2)}, \ldots, X_i^{(k/h)}$ represent $k/h$ independent random variables all with the
same distribution as $X_i$.

The fact that this test works for assembling near-optimal teams in our contest setting
has a striking interpretation: it provides a formalization of the idea that we
should indeed sometimes evaluate candidates on their potential, rather than their
demonstrated performance. Indeed, the test score in (1) is precisely a measure
of potential, since instead of just evaluating $i$’s expected performance $E[X_i]$, we’re instead
asking, “If $i$ were allowed to attempt the task $k/h$ times independently, what
would the best-case outcome look like?” Like the argument of Hong and Page about
diversity, this argument about potential has qualitative implications for evaluating
candidates in certain settings: we should think about upside potential using a
thought experiment in which candidates are allowed multiple independent tries at a
task.

Following this result, we then prove a number of other theorems that help round out
the picture of general tests and their power. We show that there are natural settings in
which no test can yield near-optimal results for team selection; these include certain
submodular functions capturing complementarity and certain supermodular functions
representing synergy. Note that this is a much stronger statement than simply asserting
the failure of the canonical test, since it says that no test can produce near-optimal
teams. Finally, we identify some further respects in which team performance functions
$g(\cdot)$ based on contest structures have tractable properties, in particular showing that
for the special case in which the random variables corresponding to all the candidates
are weighted Bernoulli variables, greedy hill-climbing on the value of $g(\cdot)$ in fact produces
an exactly optimal set of size $k$.

2. TEAM SELECTION BY TEST SCORE 

Assume we are trying to assemble a team of a fixed size k. We demonstrate that for a 
class of natural team performance metrics, while the canonical test fails (Section 3), 
picking the top $k$ candidates according to a different test score provides a constant-factor
approximation to the optimal team choice. The result may be of interest in other
contexts beyond team performance as well, since it builds on basic properties of the 
maxima over sets of random variables. 

Model. Let each candidate in our selection pool correspond to a nonnegative discrete
random variable $X$, independent of other candidates. We denote the values $X$ can
take, in decreasing order of value, by $(x_1, \ldots, x_n)$ with respective probability masses
$(p_1, \ldots, p_n)$. Thus any one candidate stochastically contributes ideas or effort of utility
$x_i$ with probability $p_i$. We consider the following team performance measures:

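Definition 2.1. For (nonnegative) random variables $X_1, \ldots, X_k$, and for $i \le k$, let
$X^{(i)}_{(X_1, \ldots, X_k)}$ denote the $i$-th largest random variable out of $X_1, \ldots, X_k$. Then for $1 \le h \le k$,
our performance measure is the expected sum of the $h$ largest values among $X_1, \ldots, X_k$:

$$g_h(X_1, \ldots, X_k) = E\left(X^{(1)}_{(X_1, \ldots, X_k)} + X^{(2)}_{(X_1, \ldots, X_k)} + \cdots + X^{(h)}_{(X_1, \ldots, X_k)}\right)$$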
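
To make the definition concrete, here is a minimal Python sketch that evaluates this performance measure by enumerating every joint outcome of a small independent team. The function name `g_h`, the list-of-pairs representation, and the two example candidates are assumptions made for illustration; the enumeration is exponential in the team size and is only practical for toy instances.

```python
from itertools import product

def g_h(team, h):
    """Expected sum of the h largest realized values of an independent team.

    `team` is a list of distributions; each distribution is a list of
    (value, probability) pairs whose probabilities sum to 1.
    """
    total = 0.0
    for outcome in product(*team):          # one (value, prob) pair per member
        prob = 1.0
        for _, p in outcome:
            prob *= p                        # independence: probabilities multiply
        values = sorted((v for v, _ in outcome), reverse=True)
        total += prob * sum(values[:h])      # keep only the h largest values
    return total

# Hypothetical candidates with the same mean (1.0) but different spread.
steady = [(1.0, 1.0)]                  # always produces 1
gambler = [(4.0, 0.25), (0.0, 0.75)]   # produces 4 a quarter of the time
team = [steady, gambler, gambler]

for h in (1, 2, 3):
    print(h, g_h(team, h))
```

With $h$ equal to the team size the measure reduces to the sum of the three means, while smaller $h$ rewards only the best realized contributions.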

For $h = 1$ this is just the expected maximum; team performance is given by the
expected utility of the single best idea or effort. For each $g_h$ we define a test $f_h$ to apply
to candidates as follows:

Definition 2.2. For a nonnegative discrete random variable $X$, and $h \le k$, let

$$f_h(X) = E\left(\max\left(X^{(1)}, \ldots, X^{(k/h)}\right)\right)$$

where $X^{(j)}$ denotes a copy of $X$. So $f_h(X)$ is the expected maximum of $k/h$ independent
copies of $X$. (We assume throughout this section that $h$ divides $k$ for convenience. The
constant-factor approximation result is not affected by this, though the bounds may
change slightly.)
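
The test score can be computed directly from the distribution of $X$, since the maximum of several independent copies equals a given value exactly when all copies are at most that value but not all are strictly below it. The sketch below is one way to do this in Python; the name `potential_score` and the example candidates are our own illustrative choices, and `copies` stands for $k/h$.

```python
def potential_score(dist, copies):
    """Expected maximum of `copies` independent draws from `dist`.

    `dist` is a list of (value, probability) pairs with nonnegative values;
    any probability mass not listed is treated as the value 0.
    """
    score, q_prev = 0.0, 0.0
    for value, prob in sorted(dist, reverse=True):   # largest value first
        q = q_prev + prob                             # P(X >= value)
        # P(max == value) = P(all copies <= value) - P(all copies < value)
        score += value * ((1 - q_prev) ** copies - (1 - q) ** copies)
        q_prev = q
    return score

# Hypothetical candidates with identical expectation 1.0.
steady = [(1.0, 1.0)]
gambler = [(4.0, 0.25), (0.0, 0.75)]

# With k = 4 and h = 1 the test uses k/h = 4 copies.
print(potential_score(steady, 4), potential_score(gambler, 4))
# With h = k the test uses a single copy and reduces to the expectation.
print(potential_score(steady, 1), potential_score(gambler, 1))
```

The two candidates are indistinguishable under the canonical test, but the high-variance candidate scores much higher once several independent tries are allowed.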

The test provides a natural interpolation between $h = k$, where $g_k(X_1, \ldots, X_k)$ is
simply the sum of the expected values, and $h = 1$, where
$g_1(X_1, \ldots, X_k) = E(\max(X_1, \ldots, X_k))$. In the former, team performance is very well captured by a canonical
test of individual performance; indeed, $f_h(X) = E(X)$ is a perfect test. However,
$E(X)$ performs very poorly as a test in the latter case. Indeed, in Section 3 we will
show that choosing the $k$ team members to have the $k$ highest expectations performs
as badly as the submodularity of $E(\max(\cdot))$ will allow: a factor of $k$ away from the optimal.

Intuitively, we can think of the success of $f_h(X)$ for small $h$ as coming from the way
it captures the potential of $X$. Supported by the rest of the team, rare, very high utility
contributions of $X$, insignificant in individual performance, may contribute substantially
to expected team performance. As $h$ decreases from $k$ to 1, we see an increase in
emphasis on potential instead of solely expected individual performance.


The top $h/2k$ quantile. The expected maximum of several independent identical
random variables has a strong connection with the upper quantile of the variable’s
distribution. Note that when we evaluate $X$ by computing $f_h(X)$, there is a significant,
$\Theta(1)$ probability that one of the $k/h$ copies of $X$ takes a value in its top $h/2k$ quantile.
(See Theorem 2.16 for another test derived directly from this.) If we use $A$ to denote
the event that some random copy takes a value in the top $h/2k$ quantile, we have

So understanding this top quantile of $X$ better will help us derive the desired guarantees.
Motivated by this, we discuss defining the top $h/2k$ quantile values of $X$.

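Observation 2.3. For a discrete random variable $X$ as in Definition 2.1, we can
think of the underlying sample space being $[0,1]$, so that for $\omega \in [0,1]$,

$$X(\omega) = \begin{cases} x_1 & \text{if } \omega \ge 1 - p_1 \\ x_j & \text{if } 1 - \sum_{i=1}^{j} p_i \le \omega < 1 - \sum_{i=1}^{j-1} p_i \\ 0 & \text{if } \omega < 1 - \sum_{i=1}^{n} p_i \end{cases}$$

In particular, this allows us to define the top values of $X$.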

Definition 2.4. For $X$ as in Definition 2.1, with the underlying sample space $[0,1]$
as in Observation 2.3, the event $A$ that $X$ takes values in its top $h/2k$ quantile is

$$A = \left\{\omega : \omega \ge 1 - \frac{h}{2k}\right\}$$

The top values of $X$ are then

$$\{x_i : 1 \le i \le n,\ \exists\, \omega \in A,\ X(\omega) = x_i\}$$

Similarly, we can define the tail values to be

$$\{x_i : 1 \le i \le n,\ \exists\, \omega \in A^c,\ X(\omega) = x_i\}$$

Note that the top values and tail values are usually not disjoint: for the boundary
value $x_t$, we may have to split $\{\omega : X(\omega) = x_t\}$ into $A$ and $A^c$.
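
The sample-space view can be sketched in a few lines of Python: each value of $X$ occupies an interval of $[0,1]$, with the largest value nearest to 1, and a value is a top value (respectively a tail value) when its interval meets the region above (respectively below) the threshold $1 - h/2k$. The function names and the example distribution are hypothetical, chosen so that the boundary value is split between the two sets.

```python
def value_at(dist, omega):
    """X(omega) for omega in [0, 1]; the largest values sit nearest to 1."""
    cumulative = 0.0
    for value, prob in sorted(dist, reverse=True):
        cumulative += prob
        if omega >= 1 - cumulative:
            return value
    return 0.0   # unlisted probability mass corresponds to the value 0

def top_and_tail_values(dist, h, k):
    """Split the support at the threshold 1 - h/(2k) of the top quantile."""
    threshold = 1 - h / (2 * k)
    top, tail, cumulative = [], [], 0.0
    for value, prob in sorted(dist, reverse=True):
        low, high = 1 - cumulative - prob, 1 - cumulative   # interval of this value
        if high > threshold:
            top.append(value)    # some omega at or above the threshold maps here
        if low < threshold:
            tail.append(value)   # some omega below the threshold maps here
        cumulative += prob
    return top, tail

# Hypothetical distribution: 10 w.p. 1/8, 5 w.p. 1/4, 1 w.p. 5/8.
dist = [(10, 0.125), (5, 0.25), (1, 0.625)]
print(value_at(dist, 0.9), value_at(dist, 0.7), value_at(dist, 0.1))
print(top_and_tail_values(dist, h=1, k=2))   # threshold is 0.75
```

In this example the value 5 straddles the threshold, so it appears among both the top values and the tail values.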


2.1. Preliminary Lemmas 

Using the framework just established, we prove some preliminary lemmas relating the 
top h/2k quantile to other natural functions of X. 

Notation. Unless otherwise specified, a random variable $X$ is assumed to be discrete
and nonnegative, taking values (in decreasing order of value) $(x_1, \ldots, x_n)$, with
associated probability masses $(p_1, \ldots, p_n)$. It will be useful to define $q_i = \sum_{j \le i} p_j$. We
will also often use $(x_1, \ldots, x_t)$ to denote the top values of $X$, with the probability mass
associated with $x_t$ split so that $q_t = \frac{h}{2k}$ exactly. We now note the following

Observation 2.5. With the notation as above, 

In the first lemma, we make precise the contribution of the top $h/2k$ quantile to
$f_h(X)$.

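Lemma 2.6. Let $X$ be a random variable, with underlying sample space $[0,1]$. Define
$X'$ as

$$X'(\omega) = \begin{cases} X(\omega) & \text{if } \omega \ge 1 - \frac{h}{2k} \\ 0 & \text{otherwise} \end{cases}$$

Then

$$f_h(X') \ge f_h(X)\left(1 - \frac{1}{\sqrt{e}}\right)$$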

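Proof. First note that if $B$ is the event that some $X^{(j)}$ in the $k/h$ copies of $X$ in
$f_h(X)$ takes one of its top values, $(x_1, \ldots, x_t)$, then certainly

$$f_h(X \mid B) = E\left(\max\left(X^{(1)}, \ldots, X^{(k/h)}\right) \,\middle|\, B\right) \ge f_h(X)$$

(as we are conditioning on an event concentrated on the highest possible values). But
the left hand side can be written out in full as

$$\frac{1}{1 - (1 - q_t)^{k/h}} \left( \left(1 - (1 - q_1)^{k/h}\right) x_1 + \cdots + \left((1 - q_{t-1})^{k/h} - (1 - q_t)^{k/h}\right) x_t \right)$$

But this is just

$$\frac{f_h(X')}{1 - (1 - q_t)^{k/h}}$$

Noting that $1 - (1 - q_t)^{k/h} \ge \left(1 - \frac{1}{\sqrt{e}}\right)$ gives the result. □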


We have therefore shown that a transformation mapping $X$ to $X'$, nonzero only on
the top $h/2k$ quantile of $X$, does not result in too large a loss in the value of $f_h(X)$. To
use this result, we need a better understanding of properties of random variables with
only values in the $h/2k$ quantile, i.e. random variables with total positive probability
mass $\le h/2k$. We explore this in the following two lemmas.


Lemma 2.7. For $a \ge 1$, the functions

$$(1 - x)^a - (1 - ax)$$

and

$$\left(1 - \frac{ax}{2}\right) - (1 - x)^a$$

are increasing for $x \in \left[0, \frac{1}{2a}\right]$.


Proof. Differentiating, and removing the positive factor of $a$, we have

$$1 - (1 - x)^{a-1}$$

which is $\ge 0$ for $x \in [0,1]$ and

$$(1 - x)^{a-1} - \frac{1}{2}$$

which achieves its minimum value at $x = \frac{1}{2a}$ but remains nonnegative for $a \ge 1$. □

Lemma 2.8. For a random variable $X$, with total probability mass $\le h/2k$ (i.e.
$q_n \le h/2k$), we have

$$\frac{k\,E(X)}{h} \ge f_h(X) \ge \frac{k\,E(X)}{2h}$$

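Proof. $f_h(X)$ can be written explicitly as

$$\left(1 - (1 - q_1)^{k/h}\right) x_1 + \left((1 - q_1)^{k/h} - (1 - q_2)^{k/h}\right) x_2 + \cdots + \left((1 - q_{n-1})^{k/h} - (1 - q_n)^{k/h}\right) x_n$$

Noting that $q_i \le q_{i+1}$, a straightforward application of Lemma 2.7 (with $a = k/h$ and the convention $q_0 = 0$) gives

$$\frac{k p_i}{2h} \le (1 - q_{i-1})^{k/h} - (1 - q_i)^{k/h} \le \frac{k p_i}{h}$$

Substituting this into the expression for $f_h(X)$ gives

$$\frac{k\,E(X)}{h} = \sum_{i=1}^{n} \frac{k p_i x_i}{h} \ge f_h(X) \ge \sum_{i=1}^{n} \frac{k p_i x_i}{2h} = \frac{k\,E(X)}{2h}$$

□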


Finally we prove two lemmas to upper bound the contribution of the top $h/2k$ quantile
events and the contribution of the tail values of $X$ in terms of $f_h(X)$.

Lemma 2.9. For a random variable $X$, underlying sample space $[0,1]$, let $A$ be as
in Definition 2.4. Then

$$E(X \mid A) \le 4 f_h(X)$$

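Proof. Splitting the boundary value $x_t$ if necessary, assume $q_t = \frac{h}{2k}$. But then
for $X'$ as in Lemma 2.6,

$$f_h(X') = \left(1 - (1 - q_1)^{k/h}\right) x_1 + \cdots + \left((1 - q_{t-1})^{k/h} - (1 - q_t)^{k/h}\right) x_t \le f_h(X)$$

As $q_t = \frac{h}{2k}$, we can use Lemma 2.7 (with $a = k/h$, $q_i \in [0, h/2k]$, $i \le t$) to get

$$\frac{k}{2h} E(X') \le f_h(X') \le f_h(X)$$

Also

$$\frac{1}{4} E(X \mid A) = \frac{1}{4} \cdot \frac{2k}{h} E(X') = \frac{k}{2h} E(X')$$

Therefore,

$$E(X \mid A) \le 4 f_h(X)$$

□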


Lemma 2.10. Let $X$ have $(x_1, \ldots, x_t)$ as its top values, with $q_t = \frac{h}{2k}$. Then

$$x_l \le \frac{f_h(X)}{1 - \frac{1}{\sqrt{e}}}$$

for any $l \ge t$.

Proof. Note that

$$\left(1 - (1 - q_t)^{k/h}\right) x_t \le f_h(X)$$

The Lemma then follows by noting that $\left(1 - (1 - q_t)^{k/h}\right) \ge 1 - \frac{1}{\sqrt{e}}$ and that $x_l \le x_t$ for
$l \ge t$. □


2.2. Main Result 

With our preliminary results derived, we seek to prove: 

Theorem 2.11. If $X_1, \ldots, X_k$ are the top scorers in our test, and $Y_1, \ldots, Y_k$ is the true
optimal team with respect to the metric $g_h$, then for a constant $\lambda$ ($\lambda < 30$),

$$g_h(Y_1, \ldots, Y_k) \le \lambda\, g_h(X_1, \ldots, X_k)$$

We will build up to the proof of this theorem in a sequence of steps, by first deriving
upper and lower bounds for $g$ in terms of $f$. That is, we will show that $f$ does indeed
capture the contribution of a person $X$ when evaluating the team with $g$. Putting together
the upper and lower bounds will give the result.
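
Before the proof, a small numerical sanity check may help build intuition. The Python sketch below compares, for $h = 1$, the team picked by the test score, the team picked by expectation, and the brute-force optimum on a hypothetical pool of weighted Bernoulli candidates (value $v$ with probability $p$, otherwise 0). The pool, the helper names, and the reading of the constant as below 30 are assumptions of this sketch, and a single instance is of course no substitute for the proof.

```python
from itertools import combinations

def expected_max(team):
    """Exact E[max] for independent weighted Bernoulli candidates (value, prob)."""
    total, none_higher = 0.0, 1.0
    for value, prob in sorted(team, reverse=True):
        total += value * prob * none_higher   # this value wins if no higher one fired
        none_higher *= 1 - prob
    return total

def potential_score(candidate, copies):
    """Expected maximum of `copies` independent tries by one candidate."""
    value, prob = candidate
    return value * (1 - (1 - prob) ** copies)

def compare(pool, k):
    """Return E[max] for the test-score team, the expectation team, and the optimum."""
    by_test = sorted(pool, key=lambda c: potential_score(c, k), reverse=True)[:k]
    by_mean = sorted(pool, key=lambda c: c[0] * c[1], reverse=True)[:k]
    optimum = max(expected_max(t) for t in combinations(pool, k))
    return expected_max(by_test), expected_max(by_mean), optimum

# Hypothetical pool: three reliable candidates and three long shots.
pool = [(1.1, 1.0)] * 3 + [(10.0, 0.1)] * 3
test_value, mean_value, optimum = compare(pool, k=3)
print(test_value, mean_value, optimum)
```

On this instance the test-score team comes within a few percent of the optimum, while the team chosen by expectation achieves less than half of it.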

The Upper Bound 

Theorem 2.12. Let $X_1, \ldots, X_k$ be random variables with $f_h(X_i) \le c$. Then

$$g_h(X_1, \ldots, X_k) \le 2hc + \frac{hc}{1 - \frac{1}{\sqrt{e}}}$$

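Proof. Assume the underlying sample space is $[0,1]^k$. Let $S \subseteq [k]$, and

$$B_S = \left\{\omega \in [0,1]^k : \omega_i \ge 1 - \frac{h}{2k} \iff i \in S\right\}$$

i.e. the event that $X_i$ takes values in its top $h/2k$ quantile iff $i \in S$. For a sample point
$\omega \in B_S$, note that

$$\left(X^{(1)}_{(X_1, \ldots, X_k)} + \cdots + X^{(h)}_{(X_1, \ldots, X_k)}\right)(\omega) \le \sum_{i \in S} X_i(\omega) + \frac{hc}{1 - \frac{1}{\sqrt{e}}}$$

Indeed, if the top $h$ values are $X_{n_1}, \ldots, X_{n_h}$, with the first $m$, $n_1, \ldots, n_m$ in $S$, then

$$\sum_{i=1}^{m} X_{n_i}(\omega) \le \sum_{i \in S} X_i(\omega)$$

The remaining random variables, $X_{n_{m+1}}, \ldots, X_{n_h}$, take tail values (as in Definition 2.4),
so by Lemma 2.10

$$\sum_{i=m+1}^{h} X_{n_i}(\omega) \le (h - m) \frac{c}{1 - \frac{1}{\sqrt{e}}} \le \frac{hc}{1 - \frac{1}{\sqrt{e}}}$$

giving the inequality. Summing up over all $\omega \in B_S$, we get

$$g_h\left((X_1, \ldots, X_k)\mathbf{1}_{B_S}\right) \le E\left(\mathbf{1}_{B_S} \sum_{i \in S} X_i\right) + P(B_S) \frac{hc}{1 - \frac{1}{\sqrt{e}}}$$

But letting $A_i$ be the event that $\omega_i \ge 1 - \frac{h}{2k}$, and using independence of the $X_i$ and
linearity of expectation

$$E\left(\mathbf{1}_{B_S} \sum_{i \in S} X_i\right) = P(B_S) \sum_{i \in S} E(X_i \mid A_i)$$

Using the bound in Lemma 2.9 this becomes

$$E\left(\mathbf{1}_{B_S} \sum_{i \in S} X_i\right) \le P(B_S)\,|S|\,4c$$

Finally, as

$$P(B_S) = \prod_{i \in S} P(A_i) \prod_{i \notin S} P(A_i^c) = \left(\frac{h}{2k}\right)^{|S|} \left(1 - \frac{h}{2k}\right)^{k - |S|}$$

i.e. the number of $X_i$ taking their top values follows a Binomial distribution, parameters
$\left(k, \frac{h}{2k}\right)$. So, summing up over $B_S$ for all $S \subseteq [k]$, we get

$$g_h(X_1, \ldots, X_k) \le \sum_{i=0}^{k} \binom{k}{i} \left(\frac{h}{2k}\right)^{i} \left(1 - \frac{h}{2k}\right)^{k-i} i \cdot 4c + \frac{hc}{1 - \frac{1}{\sqrt{e}}}$$

Noting that the first term on the right hand side is just the mean ($h/2$) of the Binomial
distribution scaled by $4c$ gives the result. □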


The Lower Bound. We now move on to a lower bound. We first give a lower bound
for the case $h = 1$, when $g_h = E(\max(\cdot))$, and show how to extend this for general $h$. To
prove the $h = 1$ case, we will use our transformation in Lemma 2.6 to zero all values
lower than the top $1/2k$ quantile, and prove a lower bound on random variables with
total positive probability mass $\le 1/2k$. We thus first state and derive this.

Lemma 2.13. Let $X_1, \ldots, X_k$ all have total positive probability mass $\le \frac{1}{2k}$, with
$f_1(X_i) \ge c$ for all $i$. Then

$$E(\max(X_1, \ldots, X_k)) \ge 2c\left(1 - \frac{1}{\sqrt{e}}\right)$$

Proof. For any $X_i$, let $A_i$ be the event that $X_i$ is nonzero. We lower bound the
expected maximum with an approximate max-finding algorithm:

If $X_1$ is nonzero, the algorithm outputs $X_1$.
Else if $X_1$ is zero but $X_2$ is nonzero, the algorithm outputs $X_2$, and so on.
If $X_1, \ldots, X_{k-1}$ are all zero, then the algorithm outputs $X_k$.

The output value of this algorithm is thus pointwise less than or equal to the true
maximum, so its expected value is a lower bound on the expected maximum. But its
expected value is just

$$P(A_1)E(X_1 \mid A_1) + (1 - P(A_1))P(A_2)E(X_2 \mid A_2) + \cdots + \left(\prod_{i=1}^{k-1} (1 - P(A_i))\right) E(X_k)$$

Noting that $P(A_i)E(X_i \mid A_i) = E(X_i)$ and that $(1 - P(A_i)) \ge \left(1 - \frac{1}{2k}\right)$, we get

$$E(\max(X_1, \ldots, X_k)) \ge E(X_1) + \left(1 - \frac{1}{2k}\right) E(X_2) + \cdots + \left(1 - \frac{1}{2k}\right)^{k-1} E(X_k)$$

Using the lower bound of $E(X_i) \ge \frac{c}{k}$ from Lemma 2.8, summing up the geometric
series, and noting $1 - \left(1 - \frac{1}{2k}\right)^{k} \ge \left(1 - \frac{1}{\sqrt{e}}\right)$, we have

$$E(\max(X_1, \ldots, X_k)) \ge 2c\left(1 - \frac{1}{\sqrt{e}}\right)$$

as desired. □
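
The scan used in this proof is simple enough to simulate exactly. The following Python sketch enumerates all joint outcomes of a small hypothetical team and compares the expected output of the scan with the true expected maximum; the function names and the three example candidates (each nonzero with probability $1/8$) are illustrative assumptions.

```python
from itertools import product

def first_nonzero(values):
    """Output of the scan: the first nonzero value, or the last value if all earlier ones are zero."""
    for v in values[:-1]:
        if v != 0:
            return v
    return values[-1]

def expectations(team):
    """Return (E[scan output], E[true maximum]) by enumerating joint outcomes."""
    scan = best = 0.0
    for outcome in product(*team):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        values = [v for v, _ in outcome]
        scan += prob * first_nonzero(values)
        best += prob * max(values)
    return scan, best

# Hypothetical team: candidate i is nonzero with probability 1/8.
team = [
    [(8.0, 0.125), (0.0, 0.875)],
    [(16.0, 0.125), (0.0, 0.875)],
    [(24.0, 0.125), (0.0, 0.875)],
]
scan_value, max_value = expectations(team)
print(scan_value, max_value)
```

Because the scan never outputs more than the realized maximum, its expectation can only undershoot, which is exactly what makes it usable as a lower bound.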

We now prove our lower bound for h = 1. 

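Theorem 2.14. Let $X_1, \ldots, X_k$ be random variables with $f_1(X_i) \ge c$ for all $i$. Then

$$E(\max(X_1, \ldots, X_k)) \ge 2c\left(1 - \frac{1}{\sqrt{e}}\right)^2$$

Proof. For any $X_i$ with total positive probability mass $> \frac{1}{2k}$, we apply the transformation
in Lemma 2.6 to get $X_i'$, which is a lower bound on $X_i$. So certainly

$$E(\max(X_1, \ldots, X_k)) \ge E(\max(X_1', \ldots, X_k'))$$

and by Lemma 2.6

$$f_1(X_i') \ge \left(1 - \frac{1}{\sqrt{e}}\right) f_1(X_i) \ge c\left(1 - \frac{1}{\sqrt{e}}\right)$$

so using Lemma 2.13, the statement of the theorem follows. □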

We now apply this to prove the main lower bound theorem 

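Theorem 2.15. Let $X_1, \ldots, X_k$ be random variables with $f_h(X_i) \ge c$ for all $i$. Then

$$g_h(X_1, \ldots, X_k) \ge 2hc\left(1 - \frac{1}{\sqrt{e}}\right)^2$$

Proof. Note that certainly

$$g_h(X_1, \ldots, X_k) \ge E(\max(X_1, \ldots, X_{k/h})) + \cdots + E(\max(X_{k - k/h + 1}, \ldots, X_k))$$

But each term on the right hand side is bounded below by $2c\left(1 - \frac{1}{\sqrt{e}}\right)^2$ by using Theorem 2.14.
So summing together, we have

$$g_h(X_1, \ldots, X_k) \ge 2hc\left(1 - \frac{1}{\sqrt{e}}\right)^2$$

as desired. □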

Finishing the proof. With established lower and upper bounds, Theorem 2.11 follows
easily.

Proof. (Theorem 2.11) First note that if $l < h$, we can define $g_h(X_1, \ldots, X_l)$ to be
the sum of the expectations of all the $X_i$, as this is the same as adding $h - l$ random
variables, each deterministically 0.

Without loss of generality, let $\{Y_1, \ldots, Y_k\} = \{Y_1, \ldots, Y_l, X_{l+1}, \ldots, X_k\}$, i.e. $X_{l+1}, \ldots, X_k$ is
the intersection of the team formed of best test scorers and the optimal team. Now, if
$c = \min_i f_h(X_i)$, then for $j \le l$, as any $Y_j$ is not in the top $k$ scorers, $f_h(Y_j) \le c$.

Note that

$$2 g_h(X_1, \ldots, X_k) \ge g_h(X_1, \ldots, X_k) + g_h(X_{l+1}, \ldots, X_k)$$

Using the lower bound from Theorem 2.15 we get

$$2 g_h(X_1, \ldots, X_k) \ge 2hc\left(1 - \frac{1}{\sqrt{e}}\right)^2 + g_h(X_{l+1}, \ldots, X_k)$$

On the other hand,

$$g_h(Y_1, \ldots, Y_l, X_{l+1}, \ldots, X_k) \le g_h(Y_1, \ldots, Y_l) + g_h(X_{l+1}, \ldots, X_k)$$

Using the upper bound from Theorem 2.12 then gives

$$g_h(Y_1, \ldots, Y_l, X_{l+1}, \ldots, X_k) \le 2hc + \frac{hc}{1 - \frac{1}{\sqrt{e}}} + g_h(X_{l+1}, \ldots, X_k)$$

So we get that

$$g_h(Y_1, \ldots, Y_k) \le \lambda\, g_h(X_1, \ldots, X_k)$$

where

$$\lambda = \frac{2 + \frac{1}{1 - 1/\sqrt{e}}}{\left(1 - 1/\sqrt{e}\right)^2} < 30$$

□


So $f_h(X)$ provides a very good measure of an individual’s contribution to a team
performance. It is not, however, the only such test score. If $E = \{\omega : \omega \ge 1 - \frac{h}{k}\}$ for
$\omega \in [0,1]$, the underlying sample space, then choosing $X$ according to the value of

$$E(X \mid E)$$

also provides a constant-factor approximation to the optimal set. Note that this function
also shows the importance of potential versus average individual performance:
as $h$ gets larger and larger, our score gets closer and closer to $E(X)$. (See Section 3 for
a more precise statement regarding $E(X)$ as a test score.)

We state the result here; for a full proof, see the Appendix. 

Theorem 2.16. If $X_1, \ldots, X_k$ are random variables with the $k$ highest values of
$E(X_i \mid E_i)$, where $E_i$ is the event that $X_i$ takes its top $h/k$ quantile of values, and $Y_1, \ldots, Y_k$
is the optimal set of size $k$, then for a constant $\mu$ independent of $k$,

$$g_h(Y_1, \ldots, Y_k) \le \mu\, g_h(X_1, \ldots, X_k)$$


The two proofs are similar, which is expected, as the analysis of the function $f_h(\cdot)$
makes use of quantities derived from $E(X \mid E)$. The function $f_h(\cdot)$ seems the more natural
of the two, however: it is arguably more direct to think about testing an individual
through repeated independent evaluations than to try quantifying what their top $h/k$
values are likely to be.

3. SUBMODULARITY AND NEGATIVE EXAMPLES 

Earlier, we claimed that $E(\max(\cdot))$ is submodular. In fact, a stronger statement is true.
To state it, we recall our notation in which, for a set $T$ of random variables, $X_T^{(i)}$ denotes
the $i$-th largest in the set.

Theorem 3.1. Let $U$ be a large finite ground set of nonnegative random variables,
with $\Omega$ being the underlying sample space. In a slight abuse of notation, for $\omega \in \Omega$, and
$h > 0$, let

$$\omega_h : \mathcal{P}(U) \to \mathbb{R}$$

be defined by

$$\omega_h(T) = \left(X_T^{(1)} + \cdots + X_T^{(h)}\right)(\omega)$$

i.e. the sum of the top $h$ values of the random variables in $T$ evaluated at the sample
point $\omega$. Then $\omega_h(\cdot)$ is submodular.

The proof of this Theorem is in the Appendix. We then have 


Corollary 3.2. For $h \ge 1$, $g_h(\cdot)$ is submodular,
by taking expectations.

There are many results about the tractability (or approximate tractability) of optimization
problems associated with submodular functions. For our purposes here, the
most useful among these results is the approximate maximization of arbitrary monotone
submodular functions over sets of size $k$. This can be achieved by a simple greedy
algorithm, which starts with the empty set, and at each stage, iteratively adds the element
providing the greatest marginal gain; the result is a provable $(1 - 1/e)$ approximation
to the true optimum [Nemhauser and Wolsey 1978]. Note that this means we
can find a good approximation of the optimal set even when the random variables $X_i$
are dependent. (See Section 4 for further discussion of this.)
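
The greedy procedure just described can be sketched as follows. This is a minimal Python illustration for the expected-maximum objective on independent weighted Bernoulli candidates (value $v$ with probability $p$, otherwise 0); the helper names and the example pool are hypothetical, and the objective `g` is passed in so that any monotone submodular set function on lists could be substituted.

```python
def expected_max(team):
    """Exact E[max] for independent weighted Bernoulli candidates (value, prob)."""
    total, none_higher = 0.0, 1.0
    for value, prob in sorted(team, reverse=True):
        total += value * prob * none_higher
        none_higher *= 1 - prob
    return total

def greedy_team(pool, k, g):
    """Start from the empty team and repeatedly add the largest marginal gain."""
    chosen, remaining = [], list(pool)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda c: g(chosen + [c]) - g(chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical pool: three reliable candidates and three long shots.
pool = [(1.1, 1.0)] * 3 + [(10.0, 0.1)] * 3
team = greedy_team(pool, 3, expected_max)
print(team, expected_max(team))
```

Unlike a test score, each greedy step evaluates candidates relative to the members already chosen, which is why it needs access to the set function itself rather than to individuals in isolation.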

The Canonical Test. Submodularity also leads to a precise characterization of
the performance of $E(X)$ as a test. It follows easily from this observation, which is
a straightforward application of the defining property of submodular functions.

Observation 3.3. If $f$ is a submodular function on $\mathcal{P}(U)$, then for every $S \subseteq U$

$$f(S) \le \sum_{x \in S} f(\{x\})$$


This naturally leads to: 

Proposition 3.4. If $g_h(\cdot)$ is the team evaluation metric, with $Y_1, \ldots, Y_k$ being the
true optimal set, and $X_1, \ldots, X_k$ the random variables with the $k$ highest expectations
(with $E(X_i) \ge E(X_j)$ if $i \le j$) then

$$g_h(Y_1, \ldots, Y_k) \le \frac{k}{h}\, g_h(X_1, \ldots, X_k)$$

and this bound is tight.


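Proof. By the observation, we note that

$$g_h(Y_1, \ldots, Y_k) \le \sum_{i=1}^{k} g_h(Y_i) = \sum_{i=1}^{k} E(Y_i)$$

But as $X_1, \ldots, X_k$ are the elements with the $k$ highest expectations,

$$\sum_{i=1}^{k} E(Y_i) \le \sum_{i=1}^{k} E(X_i) \le \frac{k}{h} \sum_{i=1}^{h} E(X_i)$$

the last inequality following from the assumption on the ordering of the $X_i$. Finally,

$$g_h(X_1, \ldots, X_k) \ge g_h(X_1, \ldots, X_h) = \sum_{i=1}^{h} E(X_i)$$

the last equality as there are only $h$ values. Putting it together, we have

$$g_h(Y_1, \ldots, Y_k) \le \frac{k}{h}\, g_h(X_1, \ldots, X_h) \le \frac{k}{h}\, g_h(X_1, \ldots, X_k)$$

as desired. For tightness, let $X_i$ be deterministically $1 + \epsilon$ and $Y_i$ be $n$ with probability
$1/n$ for large $n$. Then

$$g_h(Y_1, \ldots, Y_k) = n \sum_{i=1}^{k} \min(i, h) \binom{k}{i} \left(\frac{1}{n}\right)^{i} \left(1 - \frac{1}{n}\right)^{k-i} \ge n\left(1 - \left(1 - \frac{1}{n}\right)^{k}\right) = k + O\left(\frac{1}{n}\right)$$

Also,

$$g_h(X_1, \ldots, X_k) = h(1 + \epsilon)$$

So as $n \to \infty$ and $\epsilon \to 0$, we have

$$g_h(Y_1, \ldots, Y_k) \to \frac{k}{h}\, g_h(X_1, \ldots, X_k)$$

□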
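
The tightness example is easy to evaluate numerically. In the Python sketch below, the team of lottery-like candidates (each worth $n$ with probability $1/n$) is scored by conditioning on how many of them succeed, and the team of constant candidates is worth $h(1 + \epsilon)$. The function names and the particular values of $k$, $h$, $n$, and $\epsilon$ are choices made for this sketch.

```python
from math import comb

def g_h_lottery(k, h, n):
    """g_h for k independent candidates, each worth n w.p. 1/n and 0 otherwise."""
    p = 1.0 / n
    # If i candidates succeed, the h largest values sum to n * min(i, h).
    return n * sum(
        min(i, h) * comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(k + 1)
    )

def g_h_constant(k, h, eps):
    """g_h for k candidates that are deterministically 1 + eps."""
    return h * (1 + eps)

k, h = 6, 2
ratio = g_h_lottery(k, h, n=10**6) / g_h_constant(k, h, eps=1e-6)
print(ratio, k / h)
```

The constant candidates have the higher expectation and are therefore the ones selected by the canonical test, yet the ratio of the two team values approaches $k/h$.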


3.1. Test Scores for Other Submodular Functions? 

We have seen how a non-trivial test score can produce an approximately optimal team
for the particular submodular function corresponding to the top $h$ values of a set of
random variables, with an approximation guarantee independent of $k$. It is natural to
ask whether test scores can be used in a similar way for all submodular functions.
Here is one way to formalize this question.

Question 3.5. Given a (potentially infinite) universe $U$, an associated submodular
function $g$, and a number $k$, does there exist a test score $f$

$$f : U \to \mathbb{R}^+$$

such that for any subset $S \subseteq U$, if $x_1, \ldots, x_k \in S$ are the elements with the $k$ highest
values of $f$, then $g(x_1, \ldots, x_k)$ is always a constant-factor approximation to

$$\max_{T \subseteq S,\, |T| = k} g(T)?$$

In Section 2, we obtained a positive answer when U was the set of all applicants 
(discrete random variables), g one of the measures based on the expected maximum, 
and S a finite subset of the candidates. 

The answer to this question for arbitrary submodular functions, however, is negative.
Many submodular functions depend too heavily on the interrelations between
elements for independent evaluations of elements to work well. We present two such 
examples. 

Cardinality Function. One of the canonical examples of a submodular function is
the set cardinality function. Let $U = \mathcal{P}(\mathbb{N})$. Then for $T = \{T_1, \ldots, T_m\}$, with $T_i \in U$,

$$g(T) = \left|\bigcup_{i=1}^{m} T_i\right|$$

This function has a natural interpretation for team performance. We can imagine each
candidate as a set $T_i$, consisting of the set of perspectives they bring to the task.
$g(T_1, T_2, \ldots, T_m)$ is then the total number of distinct perspectives that the team members
bring collectively: this objective function is used in arguments that diverse teams
can be more effective [Hong and Page 2004; Marcolino et al. 2013].
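
As a small illustration of this objective, the Python sketch below scores teams by the number of distinct perspectives they cover and contrasts the team chosen by one natural individual score (the number of perspectives a candidate holds alone) with the brute-force optimum. The pool of perspective sets is hypothetical, and the individual score is only one possible test, not a statement about all tests.

```python
from itertools import combinations

def coverage(team):
    """Number of distinct perspectives held by at least one team member."""
    covered = set()
    for perspectives in team:
        covered |= perspectives
    return len(covered)

# Hypothetical pool: three similar generalists and two narrow specialists.
pool = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({1, 3, 4}),
    frozenset({5, 6}),
    frozenset({7, 8}),
]
k = 3

# Rank candidates by how many perspectives each holds individually.
by_size = sorted(pool, key=len, reverse=True)[:k]
best = max(coverage(team) for team in combinations(pool, k))
print(coverage(by_size), best)
```

The three highest-scoring candidates overlap heavily, so together they cover fewer perspectives than a team that mixes one generalist with the two specialists.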

We show a negative result for the use of test scores with this function. 
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Theorem 3.6. In the above setting, with universe U, and g the set cardinality func¬ 
tion, no such test score f exists. 

Proof. Suppose for contradiction such an f did exist. Assume ties are broken in
the worst way possible (no information is gained from a tie). Let $U_1, U_2, \ldots$ be disjoint
intervals in $\mathbb{N}$ with

$$U_i = \{(i-1)(k+1) + 1, \ldots, i(k+1)\}$$

and let

$$V_i = \{S \subseteq U_i : |S| = k\}$$

i.e. the set of all size k subsets of $U_i$. We will find it useful to label elements of $V_i$ based
on their f value, so let

$$V_i = \{X_{i1}, \ldots, X_{i(k+1)}\}$$

with

$$f(X_{i1}) \le f(X_{i2}) \le \ldots \le f(X_{i(k+1)})$$

Call a set $V_j$, $j > k$, bad with respect to $V_1$ if

$$f(X_{j1}) < f(X_{12})$$

and good otherwise. Note that we cannot have k or more of the $V_j$ bad with respect to $V_1$.
Else, supposing $V_{n_1}, \ldots, V_{n_k}$ were all bad with respect to $V_1$, in the set

$$S = \{X_{12}, \ldots, X_{1(k+1)}, X_{n_1 1}, \ldots, X_{n_k 1}\}$$

the k set chosen by f would be $X_{12}, \ldots, X_{1(k+1)}$, for a g value of $k + 1$, but the optimum is
given by $X_{n_1 1}, \ldots, X_{n_k 1}$, for a g value of $k^2$, a factor of $\approx k$ difference.

So there are fewer than k bad sets with respect to $V_1$. But the same logic applies to
$V_2, \ldots, V_k$. So in $V_{k+1}, \ldots, V_{k^2+k+1}$ there is at least one set, say $V_j$, that is good with respect
to $V_1, \ldots, V_k$. But then in the set

$$S = \{X_{11}, \ldots, X_{k1}, X_{j1}, \ldots, X_{jk}\}$$

the k set chosen by f would be $X_{j1}, \ldots, X_{jk}$, with a g value of $k + 1$, but the optimum
would be $X_{11}, \ldots, X_{k1}$ with a g value of $k^2$. □
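The gap used in the proof can be checked numerically. The sketch below is an illustration under assumed names (`block`, `size_k_subsets`, `coverage`), not code from the paper. It builds the intervals and their size-k subsets for a small k and compares a team drawn from a single block with a team taking one candidate from each of k blocks.

```python
from itertools import combinations

def block(i, k):
    """The i-th interval of k+1 consecutive integers: {(i-1)(k+1)+1, ..., i(k+1)}."""
    return range((i - 1) * (k + 1) + 1, i * (k + 1) + 1)

def size_k_subsets(i, k):
    """All k+1 subsets of the i-th block that have exactly k elements."""
    return [frozenset(c) for c in combinations(block(i, k), k)]

def coverage(team):
    """Set cardinality objective: size of the union of the chosen sets."""
    return len(set().union(*team))

k = 4
same_block = size_k_subsets(1, k)[:k]                              # k candidates from one block
one_per_block = [size_k_subsets(i, k)[0] for i in range(1, k + 1)] # one candidate per block
```

For k = 4 the first team covers k + 1 = 5 elements and the second covers k^2 = 16, matching the roughly factor-k gap in the argument.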

Linear Matroid Rank Functions. Another class of measures of team performance
is given by assigning each candidate a vector $v_i \in \mathbb{R}^m$, and taking the performance of a team
$v_1, v_2, \ldots, v_k$ to be the rank of the span of the set of corresponding vectors. Such a measure
has a similar motivation to the previous set cardinality example: if the team is trying
to solve a classification problem over a multi-dimensional feature space, then $v_i$ may
represent the weighted combination of features that candidate i brings to the problem,
and the span of $v_1, v_2, \ldots, v_k$ establishes the effective number of distinct dimensions
the team will be able to use.

More generally, the rank of the span of a set of vectors is a matroid rank function,
and we can ask the question in that context. Given a matroid $(V, \mathcal{I})$ and a set $S \subseteq V$,
the matroid rank function g is

$$g(S) = \max\{|T| : T \subseteq S,\ T \in \mathcal{I}\}$$

i.e. the size of a maximal independent set contained in S. It is well known that matroid rank
functions are submodular [Birkhoff 1933]. To come back to our vector space example,
we show that when our underlying set is $\mathbb{R}^m$, and $\mathcal{I}$ consists of the subsets that are linearly
independent, no single element test can capture the relation between vectors well.
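As an illustration of this objective, the following sketch computes the rank of a team's vectors by Gaussian elimination over exact rationals. The function name `rank` and the example vectors are assumptions made for illustration. It shows why any per-vector score is blind to the key structure: scalar multiples of one vector can each look strong individually yet jointly have rank 1.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the span of the given vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical teams in R^3.
same_direction = [(1, 0, 0), (2, 0, 0), (3, 0, 0)]   # three multiples of one vector
standard_basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
```

`rank(same_direction)` is 1 while `rank(standard_basis)` is 3, even though the two teams have the same size.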

Theorem 3.7. For U, g as above, no test score with good approximation exists. 
The proof of this theorem relies on the fundamental property of $\mathbb{R}$. We show that
for any sequence along a specific direction, the f values for this sequence must be
bounded. By the defining property of $\mathbb{R}$, each sequence then has a convergent subsequence.
Looking at these convergent subsequences along each of k coordinate axes
$e_1, \ldots, e_k$, we can then pick our bad set, fooling f into choosing $O(k)$ points in the same
direction. See the Appendix for a full proof.

A Supermodular Function. The above two examples show bad cases for submodular
functions. As is expected, supermodular functions also have a negative answer to
Question 3.5.

A classic example of a supermodular function is the edge count function. 

Definition 3.8. Given a graph $G = (V, E)$, and a set $S \subseteq V$, $g(S)$ is the number of
edges in the induced subgraph with vertex set S.

It is easy to check that g is supermodular. g also forms our bad example for supermodular
functions.
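A small sketch of the edge count function follows, with hypothetical helper names `induced_edges` and `marginal` and an example graph chosen for illustration. It shows the supermodular pattern: the marginal gain of adding a vertex grows as the set it joins grows.

```python
def induced_edges(edges, S):
    """g(S): number of edges with both endpoints inside the vertex set S."""
    S = set(S)
    return sum(1 for u, v in edges if u in S and v in S)

def marginal(edges, S, v):
    """Gain in induced edges from adding vertex v to S."""
    return induced_edges(edges, set(S) | {v}) - induced_edges(edges, S)

# Hypothetical graph: a triangle 0-1-2 with a pendant edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
```

Adding vertex 2 to `{0}` gains one edge, while adding it to the superset `{0, 1}` gains two, the reverse of the diminishing returns seen for submodular functions.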

Theorem 3.9. Let U be a very large graph, containing at least N disjoint complete
graphs with k + 1 vertices, i.e. copies of $K_{k+1}$. Then there is no test score f with a constant
(independent of k) order approximation property to the optimal k set with respect to g.

The proof is very similar to the cardinality function case. There, we wanted to
avoid picking subsets of the same set; here, we would like to pick as many vertices
in a single clique as possible. We adjust the notion of bad accordingly to ensure this
doesn’t happen, and arrive at our desired contradiction identically to before.

A particularly interesting feature of this case is that, without the canonical statistical
test for submodular functions, we can have an arbitrarily bad approximation ratio:
even if f is defined to be constant on each vertex, the counterexample demonstrates
that f may pick a set with no induced edges.

4. HILL CLIMBING AND OPTIMALITY 

For most non-trivial submodular functions, finding the optimal solution is computationally
intractable. This is the case for the maximum of a set of random variables
that are not necessarily independent. In particular, suppose that $S = \{X_1, X_2, \ldots, X_n\}$
is a set of dependent random variables. For a set T of them, we can define $g(T)$ to be
the expected maximum of the random variables in T. We now argue that maximizing
$g(T)$ is an NP-hard problem in general. We will do this by reducing an instance of Set
Cover to the problem.

Recall that in set cover, we have a universe U, and a set $T = \{S_1, \ldots, S_n\}$ of subsets of
U, i.e. $S_i \subseteq U$ for all i. We wish to know if there is a subset $T' \subseteq T$, with $|T'| \le k$, such
that $\bigcup_{S_i \in T'} S_i = U$. To model this with random variables, let the underlying sample
space be U (with every point having positive probability), and each $X_i = \mathbb{1}_{S_i}$, the indicator
function for the set $S_i$. Then it is easy to see that there exists a team of size k with expected
maximum 1 if and only if there exists $T'$ as above, $|T'| \le k$. So maximizing the expected
maximum of a set of size k provides an answer to the NP-complete decision problem.
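The reduction can be sketched in a few lines. The code below assumes a uniform distribution on the universe (one convenient choice; the paper does not fix the distribution here), and the names `expected_max_indicators` and `has_cover_of_size` are hypothetical. Under this encoding the expected maximum of a team of indicator variables is simply the fraction of the universe covered by the chosen sets.

```python
from fractions import Fraction
from itertools import combinations

def expected_max_indicators(universe, chosen_sets):
    """E[max of indicators] under a uniform sample space: covered fraction of the universe."""
    covered = set().union(*chosen_sets) & set(universe)
    return Fraction(len(covered), len(universe))

def has_cover_of_size(universe, family, k):
    """A cover with at most k sets exists iff some team of size min(k, n) has expected max 1."""
    size = min(k, len(family))
    return any(expected_max_indicators(universe, team) == 1
               for team in combinations(family, size))

# Hypothetical Set Cover instance.
universe = {1, 2, 3, 4, 5}
family = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
```

Teams of exactly k sets suffice because adding sets never shrinks the union. In this instance `{1, 2, 3}` and `{4, 5}` cover the universe, so the answer is positive for k = 2 and negative for k = 1.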

In terms of approximation, we can apply the general hill-climbing result mentioned
earlier [Nemhauser and Wolsey 1978] to provide a $(1 - 1/e)$ approximation for finding
the set of k dependent random variables with the largest expected maximum.

A natural question is whether independence is a strong enough assumption to guarantee
a better approximation ratio. Indeed, we may even be tempted to ask

Question 4.1. If $X_1, \ldots, X_n$ are (discrete) independent random variables, does
hill-climbing find the size k set maximizing the expected maximum?
Unfortunately, the answer is no. For a simple counterexample, take X taking positive
values (9/5, 6/5) with respective probability masses (1/3, 1/3) and 0 otherwise, Y deterministically
$1 + \epsilon$ for $\epsilon$ very small, and Z taking a positive value 3/2 with probability 2/3 and 0
otherwise. Then $\mathbb{E}(Y) > \mathbb{E}(X), \mathbb{E}(Z)$, which means in the first step, hill-climbing would choose Y. But,

$$\mathbb{E}(\max(X, Z)) > \mathbb{E}(\max(Y, Z)),\ \mathbb{E}(\max(X, Y))$$

so hill-climbing would not find the optimal solution. In this counterexample, Y and Z are
both examples of weighted Bernoulli random variables.
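The counterexample can be verified with exact arithmetic. The sketch below represents each independent variable as a list of (value, probability) pairs and enumerates joint outcomes. The representation, the function name `expected_max`, and the choice of 1/100 for the small constant are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product

def expected_max(variables):
    """E[max] of independent discrete variables, each a list of (value, probability) pairs."""
    total = F(0)
    for outcome in product(*variables):
        prob = F(1)
        for _, p in outcome:
            prob *= p
        total += prob * max(v for v, _ in outcome)
    return total

eps = F(1, 100)  # assumed small constant
X = [(F(9, 5), F(1, 3)), (F(6, 5), F(1, 3)), (F(0), F(1, 3))]
Y = [(1 + eps, F(1))]
Z = [(F(3, 2), F(2, 3)), (F(0), F(1, 3))]
```

With these values Y has the largest individual expectation, yet the pair (X, Z) has expected maximum 7/5, strictly above 401/300 for either pair containing Y.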

Definition 4.2. We say a random variable X has the weighted Bernoulli distribution,
if $X = x$ for some $x > 0$ with probability p, and $X = 0$ otherwise.

What is surprising is that when all our random variables are weighted Bernoulli,
Question 4.1 has an affirmative answer.

Theorem 4.3. Given a pool of random variables, each of weighted Bernoulli distribution,
performing hill-climbing with respect to $\mathbb{E}(\max(\cdot))$ finds the size k set
maximizing the expected maximum.

In the context of forming teams, we can think of candidates with weighted Bernoulli 
distributions as having a sharply “on-off” success pattern — they have a single way to 
succeed, producing a given utility, and otherwise they provide zero utility. 

For X as above, we will find it convenient to denote X as $(p, x)$. For two weighted
Bernoulli random variables $X = (p, x)$ and $Y = (q, y)$, we use $X \ge Y$ to mean $x \ge y$.
For $X_i = (p_i, x_i)$, with $X_1 \ge \ldots \ge X_k$, the expected maximum has an especially clean
form:

$$\mathbb{E}(\max(X_1, \ldots, X_k)) = p_1 x_1 + (1 - p_1) p_2 x_2 + \ldots + \left( \prod_{i=1}^{k-1} (1 - p_i) \right) p_k x_k$$

Rewriting this slightly, it also has an intrinsically recursive structure:

$$\mathbb{E}(\max(X_1, \ldots, X_k)) = p_1 x_1 + (1 - p_1)\, \mathbb{E}(\max(X_2, \ldots, X_k))$$
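The closed form makes hill-climbing cheap to simulate. The following sketch, using an assumed pool of (p, x) pairs and hypothetical helper names, evaluates the expected maximum by unrolling the recursion and compares the greedy team with the brute-force optimum on one small instance. It illustrates Theorem 4.3 on an example and is not a proof of it.

```python
from fractions import Fraction as F
from itertools import combinations

def expected_max(team):
    """Closed form for independent weighted Bernoulli variables given as (p, x) pairs."""
    total, all_failed = F(0), F(1)
    for p, x in sorted(team, key=lambda px: px[1], reverse=True):
        total += all_failed * p * x   # x is the maximum iff every larger value failed
        all_failed *= 1 - p
    return total

def hill_climb(pool, k):
    """Greedy: repeatedly add the candidate that gives the largest expected maximum."""
    chosen, rest = [], list(pool)
    for _ in range(min(k, len(rest))):
        best = max(rest, key=lambda c: expected_max(chosen + [c]))
        chosen.append(best)
        rest.remove(best)
    return chosen

def brute_force_value(pool, k):
    """Optimal expected maximum over all size-k teams (exponential; illustration only)."""
    return max(expected_max(list(team)) for team in combinations(pool, k))

# Hypothetical pool of (p, x) candidates.
pool = [(F(1), F(1)), (F(1, 2), F(19, 10)), (F(1, 2), F(19, 10)), (F(1, 10), F(9))]
```

On this pool the greedy value matches the optimum for k = 1, 2, 3. For k = 2 both equal 9/5, obtained by pairing the sure candidate with the long shot (1/10, 9).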

As a step towards proving Theorem 4.3, we need two useful lemmas on when random
variables can be exchanged without negatively affecting the expected maximum.
Assume from now on all random variables are weighted Bernoulli.

We state the lemmas below, with full proofs in the Appendix. 

Our first lemma shows that if one random variable dominates another in both 
nonzero value and expectation, we may always substitute in the dominating variable. 
So given two random variables with the same expected value, we always prefer the 
’riskier’ random variable. 

Lemma 4.4. If $X \ge Y$, and $\mathbb{E}(X) \ge \mathbb{E}(Y)$, then for any $X_1, \ldots, X_k$,

$$\mathbb{E}(\max(X, X_1, \ldots, X_k)) \ge \mathbb{E}(\max(Y, X_1, \ldots, X_k))$$
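As a small worked instance of this lemma (the numbers are chosen for illustration and do not come from the paper), take the riskier $X = (1/2, 2)$ and the safer $Y = (1, 1)$, which have the same expectation, together with one teammate $X_1 = (1/2, 3/2)$. Applying the recursive formula for the expected maximum:

```latex
\begin{align*}
\mathbb{E}(X) &= \tfrac12 \cdot 2 = 1 = \mathbb{E}(Y), \qquad x = 2 \ge y = 1, \\
\mathbb{E}(\max(X, X_1)) &= \tfrac12 \cdot 2 + \tfrac12 \cdot \left(\tfrac12 \cdot \tfrac32\right) = \tfrac{11}{8}, \\
\mathbb{E}(\max(Y, X_1)) &= \tfrac12 \cdot \tfrac32 + \tfrac12 \cdot 1 = \tfrac{5}{4} = \tfrac{10}{8}.
\end{align*}
```

So substituting the riskier variable for the safer one raises the expected maximum from 10/8 to 11/8, consistent with the lemma.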

The next lemma describes a slightly technical variant of the above substitution rule: 

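Lemma 4.5. Let $X \ge Y$, and $\mathbb{E}(\max(X, X_1, \ldots, X_k)) \ge \mathbb{E}(\max(Y, X_1, \ldots, X_k))$. Then if
$Y_1, \ldots, Y_m$ are such that $Y \ge Y_i$ for all i,

$$\mathbb{E}(\max(X, X_1, \ldots, X_k, Y_1, \ldots, Y_m)) \ge \mathbb{E}(\max(Y, X_1, \ldots, X_k, Y_1, \ldots, Y_m))$$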

We can now easily prove Theorem 4.3.

Proof. (Theorem 4.3) We prove this inductively, showing that the element chosen
by hill-climbing at time i is part of the optimal set from then on. Our base case is
proving the first element chosen, $X = (p, x)$, which has greatest expectation, is always
in the optimal set. Suppose the optimal set of size k is $\{Y_1, \ldots, Y_k\}$, in value order. Then if some $Y_i \le X$,
by Lemma 4.4 we could replace $Y_i$ by X. So $X < Y_k$. But as $Y_k$ only appears as $\mathbb{E}(Y_k)$
in $\mathbb{E}(\max(Y_1, \ldots, Y_k))$, and X has greatest expectation, we can replace $Y_k$ by X.

Suppose we have chosen t random variables, $X_1 \ge \ldots \ge X_t$, with the most recently chosen
random variable being $X_i$. By the induction hypothesis, we know $X_j$ for $j \ne i$ are part
of any $\ge t$ sized optimal set. For an optimal solution of size k, let $Y_1 \ge \ldots \ge Y_m$ (where
m may equal 0) be the random variables distinct from $X_i$, in between $X_{i-1}$ and $X_{i+1}$
value-wise. Similarly, let $Z_1 \ge \ldots \ge Z_h$ be the random variables in between $X_{i+1}$ and
$X_t$. We have a few cases.

First note if $m > 0$, and $X_i \ge Y_j$ for some j, then as $\mathbb{E}(\max(X_i, \ldots, X_t)) \ge
\mathbb{E}(\max(Y_j, X_{i+1}, \ldots, X_t))$, by applying Lemma 4.5 we can swap $Y_j$ with $X_i$. So $X_i < Y_j$
for all j, or $m = 0$. In either case, if $h > 0$, applying Lemma 4.5 again, we may swap
$X_i$ with $Z_1$. So $h = 0$, and so in value order, the final string of random variables in the
optimal set is just $X_i, X_{i+1}, \ldots, X_t$. Note that if we take the smallest random variable
distinct from the $X_i$ larger than $X_i$, say Y, with $X_j \ge Y \ge X_{j+1}$, then as

$$\mathbb{E}(\max(X_1, \ldots, X_t)) \ge \mathbb{E}(\max(Y, X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_t))$$

from the choice of elements by the hill-climbing algorithm, by the recursive structure
of the expected maximum, we must have

$$\mathbb{E}(\max(X_j, X_{j+1}, \ldots, X_i, \ldots, X_t)) \ge \mathbb{E}(\max(X_j, Y, \ldots, X_{i-1}, X_{i+1}, \ldots, X_t))$$

so we can swap Y with $X_i$. This completes the induction step, and the proof. □

This proof method gives us a simple condition which is sufficient (though slightly 
stronger than necessary) for when the hill climbing algorithm finds the optimal set: 

Condition 4.6. Let f be a submodular function on a universe U. If $S_t = \{x_1, \ldots, x_t\}$
is the set picked by hill climbing at time t (with $S_0 = \emptyset$ at $t = 0$), and $x_{t+1}$ is the next
element chosen by hill climbing, then for any $Z \subseteq U \setminus S_t$, we must have

$$\max_{z \in Z} f\left(S_t \cup \{x_{t+1}\} \cup Z \setminus \{z\}\right) \ge f(S_t \cup Z)$$

For submodular functions satisfying Condition 4.6, it is possible to prove the optimality
of hill-climbing as above. Given that $S_t$ is part of the optimal set, we show that we can
always substitute $x_{t+1}$ into the optimal solution and ensure the value of f doesn’t
decrease. Hence, $x_{t+1}$ must be part of the optimal set.
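On a small universe the exchange condition can be checked by brute force along the greedy path. The sketch below is an illustration under assumed names (`satisfies_condition`, `coverage`) and assumed example instances. An additive function passes the check, while a coverage function on which greedy is suboptimal fails it.

```python
from itertools import combinations

def satisfies_condition(universe, f, steps):
    """Check the exchange condition for the first `steps` greedy choices.

    f takes a list of elements; universe elements must be hashable and distinct.
    """
    chosen = []
    for _ in range(steps):
        rest = [u for u in universe if u not in chosen]
        nxt = max(rest, key=lambda u: f(chosen + [u]))      # next greedy element
        for size in range(1, len(rest) + 1):
            for Z in combinations(rest, size):
                # Best value after swapping some z in Z for the greedy element.
                swapped = max(
                    f(list(set(chosen) | {nxt} | (set(Z) - {z}))) for z in Z
                )
                if swapped < f(chosen + list(Z)):
                    return False
        chosen.append(nxt)
    return True

def coverage(team):
    """Size of the union of the chosen sets."""
    return len(set().union(*team))

weights = [5, 3, 2, 1]                                       # additive case: f = sum
sets = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 5}), frozenset({3, 4, 6})]
```

For the coverage instance the greedy first pick is the four-element set, but the other two sets together cover six elements and no single swap recovers that value, so the check returns `False`.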

5. CONCLUSION AND OPEN PROBLEMS 

In this paper, we have demonstrated that for a natural family of submodular performance
metrics, team selection can happen solely on an individual basis, with minimal
concession in team quality. However, this selection criterion is more intricate than the
canonical test (singleton set value), the performance of which we also characterized.
Not all submodular functions are amenable to such an approximation, and we exhibited
examples where no function could always guarantee a constant order bound. This
leads to the natural question of whether it is possible to characterize the truly submodular
functions (functions for which, like the expected maximum, the canonical test
performs poorly) which can be approximated in such a fashion. There may be an opportunity
to connect such questions to a distinct literature on approximating a submodular
function with only a small number of values known [Goemans et al. 2009], and approximation
by juntas [Feldman and Vondrák 2013].

Finally, we also explored the implications of independence of random variables when
using hill-climbing to approximate the size-k set maximizing the expected maximum.
We established that for certain random variables, we could find the true optimum this
way. A natural question is then, for what distributional assumptions can we guarantee
optimality, or a significantly better approximation ratio? Much work has been done
on structural properties of ensembles of random variables with different distributions
[Daskalakis et al. 2012a; Daskalakis et al. 2012b], and it is possible that such techniques
may be useful here.
6. APPENDIX

Here we provide a proof of Theorem 2.16. Note that large parts of the proof are very
similar to Theorem 2.11, so we will omit some of the details.

Proof. Note that if we find upper and lower bounds like Theorem 2.12 and Theorem 2.15,
then we can use the final part of the proof of Theorem 2.11 unchanged to
give our desired result.

First, note that if $\mathbb{E}(X \mid E) \le c$, then any value of X not in its top h/k quantile must
be $\le c$ (conditioning on E ensures that the expectation of X is a linear combination of
the top values of X). Now, if $X_1, \ldots, X_k$ are such that $\mathbb{E}(X_i \mid E_i) \le c$ for all i, then letting
$T \subseteq [k]$ and

$$C_T = \left\{\omega \in [0, 1]^k : \omega_i > 1 - \tfrac{h}{k} \iff i \in T\right\}$$

be defined analogously to before, we get

$$g_h((X_1, \ldots, X_k)\mathbb{1}_{C_T}) \le P(C_T)|T|c + P(C_T)hc$$

as before. Summing up we note



so we have a Binomial distribution with parameters $(k, h/k)$, similar to before, so

$$g_h(X_1, \ldots, X_k) \le hc + hc = 2hc$$

This gives us an upper bound. The lower bound is of a similar flavor to the upper
bound. Suppose $X_1, \ldots, X_k$ are such that $\mathbb{E}(X_i \mid E_i) \ge c$ for all i, and T and $C_T$ are as above.
Then note that

$$g_h((X_1, \ldots, X_k)\mathbb{1}_{C_T}) \ge P(C_T) \cdot \min(|T|, h)\, c$$

i.e. for an event $\omega \in C_T$, $g_h(X_1, \ldots, X_k)$ is greater than summing the minimum of h and
$|T|$ of the random variables that take values in their top h/k quantile. Noting we have
the same Binomial distribution as before



where the last inequality follows by noting that as the mean of this distribution is h,
the median is certainly contained in the range $h/2 \le i \le k$.

Note that to be entirely precise, we should replace $h/2$ with $\lfloor h/2 \rfloor$. The $h = 1$ case then
needs to be dealt with separately. For $h = 1$, note that the probability at least one of
the $X_i$ takes a value in its top $h/k = 1/k$ quantile is



So for the $h = 1$ case we can bound below by



We finish using the same proof as in Theorem 2.11, getting $\mu = 16$. □


6.1. Submodularity and Negative Examples: Proofs 

We first give a proof of Theorem 3.1.

In summary, we prove that if $A = S \setminus \{y\}$, with $S \subseteq U$, then for $x \notin S$, the submodular
property

$$\omega_h(S \cup \{x\}) - \omega_h(S) \le \omega_h(A \cup \{x\}) - \omega_h(A)$$

holds. We show this by fixing an order of elements in S under $\omega$ and considering what
each side of the inequality looks like. Chaining a set of inequalities of this form by
removing one element each time gives the result for arbitrary subsets of S.

(Note that if $|A| < h$, only the first $|A|$ terms are possibly nonzero; we can increase
$|A|$ by adding a number of deterministically zero random variables.)

Proof. (Theorem 3.1)

Assume $S = \{X_1, \ldots, X_n\}$, and $A = \{X_1, \ldots, X_{n-1}\}$. Rearranging, the submodularity
inequality becomes

$$\omega_h(A \cup \{X, X_n\}) + \omega_h(A) \le \omega_h(A \cup \{X\}) + \omega_h(A \cup \{X_n\})$$

First note that $X, X_n$ are interchangeable in the above inequality. We examine two
cases.

(1) At least one of $X, X_n$, wlog X (by symmetry), is not in the top h values in $\omega$. This has
two easy subcases. If $|A| \ge h$, then

$$\omega_h(A \cup \{X, X_n\}) + \omega_h(A) = \omega_h(A \cup \{X_n\}) + \omega_h(A)$$

and

$$\omega_h(A \cup \{X\}) + \omega_h(A \cup \{X_n\}) = \omega_h(A) + \omega_h(A \cup \{X_n\})$$

so equality holds. In the other case, we have $|A| < h$, so we get

$$\omega_h(A \cup \{X\}) + \omega_h(A \cup \{X_n\}) = \omega_h(A) + X(\omega) + \omega_h(A) + X_n(\omega)$$

The left hand side of the target inequality becomes

$$\omega_h(A \cup \{X, X_n\}) + \omega_h(A) \le (A + X + X_n)(\omega) + A(\omega)$$

with strict inequality if $|A| = h - 1$, as X would be omitted in this case. So again, the
desired inequality holds.

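(2) Now, we may assume that $X_n, X$ are both in the top h. Assume $X_n(\omega)$ is the i-th largest
value in $A \cup \{X, X_n\}$ and $X(\omega)$ is the j-th largest, and wlog $i < j$. In $A \cup \{X, X_n\}$, let the
top $h + 2$ elements (with appropriately many zero elements) be ordered as below:

$$X_{n_1}(\omega) \ge X_{n_2}(\omega) \ge \ldots \ge X_{n_{i-1}}(\omega) \ge X_n(\omega) \ge X_{n_{i+1}}(\omega) \ge \ldots \ge X_{n_{j-1}}(\omega) \ge X(\omega) \ge X_{n_{j+1}}(\omega) \ge \ldots \ge X_{n_{h+2}}(\omega)$$

Then we get

$$\omega_h(A \cup \{X, X_n\}) + \omega_h(A) = 2\left(\sum_{l \le h,\ l \ne i, j} X_{n_l}\right) + X + X_n + X_{n_{h+1}} + X_{n_{h+2}}$$

and

$$\omega_h(A \cup \{X\}) + \omega_h(A \cup \{X_n\}) = 2\left(\sum_{l \le h,\ l \ne i, j} X_{n_l}\right) + X + X_n + 2X_{n_{h+1}}$$

Noting that $X_{n_{h+1}} \ge X_{n_{h+2}}$ gives the result. □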

Below is the full proof of Theorem 3.7.

Proof. (Theorem 3.7) Like before, we assume for contradiction that such an f does
exist. We need a Lemma.

Lemma 6.1. Let $x \in \mathbb{R}^m$. Then the set

$$\{f(\lambda x) : \lambda \in \mathbb{R}\}$$

is bounded.

Proof. Suppose not, then there is a sequence $(\lambda_n)_{n \in \mathbb{N}}$ such that

$$f(\lambda_n x) > n$$

But letting $e_1, \ldots, e_k$ be the standard basis vectors, and $c = \max_i f(e_i)$, there are
$\lambda_{n_1}, \ldots, \lambda_{n_k}$ with

$$f(\lambda_{n_i} x) > c$$

so in the set $\{e_1, \ldots, e_k, \lambda_{n_1} x, \ldots, \lambda_{n_k} x\}$, the optimal set has rank k but the highest scoring
k set has rank 1. □

The consequence (from the fundamental property of the real numbers) is that the f values of any
sequence of vectors along a particular direction have a convergent subsequence. In
particular, defining



we see that for each i, $(a_{in})$ has a convergent subsequence. Relabelling if necessary, let
this convergent subsequence be $(a_{in})$, with

$$a_{in} \to b_i$$

for each i. Wlog, we assume that $b_1 \ge b_2 \ge \ldots \ge b_k$. We now complete the theorem by
examining a few cases.

Case 1: $b_1 > b_{k/2}$. In this case, we can take terms very close to $b_1$ and terms very close to $b_i$ for
$i \ge k/2$ to ensure we pick all the $a_{1m}$ terms, which only have rank 1.

In more detail, let $0 < \delta < b_1 - b_{k/2}$. Then as we have a finite number of convergent
sequences, there exists N such that for all $m > N$, $|a_{im} - b_i| < \delta/3$ for all i. So for $l, m > N$, and
for all $i \ge k/2$ we have

$$a_{1m} > a_{il}$$

In particular, in the set

$$\{a_{1m}, \ldots, a_{1(m+k)}, a_{(k/2)l}, \ldots, a_{kl}\}$$

the k set with the maximum f values are the first k, for a rank of 1, but the optimal
set can achieve rank $k/2 + 1$ (taking say the last $k/2 + 1$ elements), providing the
desired contradiction.

Case 2: $b_1 = \ldots = b_{k/2} = b$. Here we derive a contradiction by looking more closely at what
each sequence for $i \le k/2$ can do. Assume from now
on that $i \le k/2$.

(i) If for some i, say $i = 1$, there were $n_1, \ldots, n_k$ and $\delta > 0$ such that $a_{1n_r} > b + \delta$, then
for $j \ne 1$, picking $a_{jl}$ within $\delta/2$ of b would mean $\{a_{1n_r} : r \le k\} \cup \{a_{jl} : j \le k/2\}$
would form a bad set for f, with a $2/k$ approximation ratio.

(ii) So certainly only finitely many terms are $> b$ for any i. Discarding them, assume the
sequences $a_{ij} \le b$ for all i, j. If for some i, say $i = 1$, k or more terms were equal
to b, say $a_{1n_1}, \ldots, a_{1n_k}$, then for any j (noting we break ties as in the worst case), f
performs poorly ($2/k$ approximation) on the set $\{a_{1n_1}, \ldots, a_{1n_k}\} \cup \{a_{21}, \ldots, a_{(k/2)1}\}$.

(iii) So for each i, only finitely many terms $= b$. Discarding those, assume all $a_{ij} < b$.
Let $c = \min_i a_{i1}$. Then picking $n_1, \ldots, n_k$ so $a_{1n_r} > c$, f has the same poor $2/k$
approximation on $\{a_{11}, \ldots, a_{(k/2)1}, a_{1n_1}, \ldots, a_{1n_k}\}$.

This completes the proof of the Theorem. □ 

We now give the full proof for the bad example for supermodular functions.

Proof. (Theorem 3.9)

Assume such an f does exist. Let $K^1, \ldots, K^N$ be the set of size-$(k+1)$ complete graphs.
Let the vertices of $K^j$ be $\{v_{j1}, \ldots, v_{j(k+1)}\}$ in increasing order of f-value. Consider $K^1$.
For $j > k$, say $K^j$ is bad with respect to $K^1$ if $f(v_{j(k+1)}) > f(v_{1k})$. If $K^{n_1}, \ldots, K^{n_k}$ are all
bad with respect to $K^1$, then in the set $\{v_{11}, \ldots, v_{1k}, v_{n_1(k+1)}, \ldots, v_{n_k(k+1)}\}$, the set chosen
by the test score would be $v_{n_1(k+1)}, \ldots, v_{n_k(k+1)}$, for no induced edges, while the optimal
set is $v_{11}, \ldots, v_{1k}$ with $k(k-1)/2$ induced edges.

So there are less than k graphs bad with respect to $K^1$. Similarly to before, applying
the same argument to $K^2, \ldots, K^k$, we note that in $K^{k+1}, \ldots, K^{k^2+k+1}$ there is at least
one graph that is not bad with respect to any of $K^1, \ldots, K^k$, say $K^m$. But then taking the
set $\{v_{1(k+1)}, \ldots, v_{k(k+1)}, v_{m1}, \ldots, v_{mk}\}$, the test score picks $v_{1(k+1)}, \ldots, v_{k(k+1)}$, again with no
induced edges, while the optimal set is $v_{m1}, \ldots, v_{mk}$ with $k(k-1)/2$ edges. □

6.2. Hill-Climbing and Optimality 

We give proofs of the two lemmas to show optimality in the weighted Bernoulli case. 

Proof. (Lemma 4.4) Assume the $X_i$ are in value order. Wlog assume $X \ge X_i$ for all i
(an almost identical proof works if that is not the case) and that $X_t \ge Y \ge X_{t+1}$. Let
$X = (p, x)$, $X_i = (p_i, x_i)$. Also, assume that $Y = (q, y)$. By the recursive structure
of the expected maximum for weighted Bernoulli random variables,

$$\mathbb{E}(\max(X, X_1, \ldots, X_k)) = px + (1 - p)b + (1 - p)sc$$

and

$$\mathbb{E}(\max(Y, X_1, \ldots, X_k)) = b + sqy + s(1 - q)c$$

where

$$b = \mathbb{E}(\max(X_1, \ldots, X_t)), \qquad s = P(X_1, \ldots, X_t = 0), \qquad c = \mathbb{E}(\max(X_{t+1}, \ldots, X_k))$$

Note $b + sc \le x$ as $X \ge X_1, \ldots, X_k$. So, if $p > q$,

$$px + (1 - p)(b + sc) \ge qx + (1 - q)(b + sc)$$

The left hand side of the above is just $\mathbb{E}(\max(X, X_1, \ldots, X_k))$, so we can assume
$p \le q$ by decreasing p to q if necessary, and this will only decrease the value of
$\mathbb{E}(\max(X, X_1, \ldots, X_k))$. Now, note that

$$\mathbb{E}(\max(X, X_1, \ldots, X_k)) \ge \mathbb{E}(\max(Y, X_1, \ldots, X_k)) \iff px - pb - sqy + (q - p)sc \ge 0$$

But $b/(1 - s)$ is a convex combination of $x_1, \ldots, x_t$, so $b/(1 - s) \le x$. So,

$$px - pb - sqy + (q - p)sc \ge spx - sqy + (q - p)sc$$

Finally, by assumption, $\mathbb{E}(X) \ge \mathbb{E}(Y)$, and $p \le q$, so the result holds. □

Proof. (Lemma 4.5) We prove this by contradiction. Again, we may assume that
$X \ge X_i$ for all i, the $X_i$ are in value order, and $X_t \ge Y \ge X_{t+1}$ as before. Using the
notation of Lemma 4.4, first note that $p < q$, as otherwise, $\mathbb{E}(X) \ge \mathbb{E}(Y)$, and we could
directly apply Lemma 4.4. Our assumption gives the following inequality:

$$px + (1 - p)b + (1 - p)sc \ge b + sqy + (1 - q)sc$$

Suppose the Lemma is false. Then, we have

$$px + (1 - p)b + (1 - p)sd < b + sqy + (1 - q)sd$$

where

$$d = \mathbb{E}(\max(X_{t+1}, \ldots, X_k, Y_1, \ldots, Y_m))$$

We show that both of these inequalities cannot hold simultaneously.

As $p < q$, we have that

$$\mathbb{E}(\max(X, X_1, \ldots, Y_m)) - \mathbb{E}(\max(X, X_1, \ldots, X_k)) = (1 - p)s(d - c) \ge (1 - q)s(d - c)$$

But

$$\mathbb{E}(\max(Y, X_1, \ldots, Y_m)) - \mathbb{E}(\max(Y, X_1, \ldots, X_k)) = (1 - q)s(d - c)$$

Writing

$$\mathbb{E}(\max(X, X_1, \ldots, Y_m)) = \left(\mathbb{E}(\max(X, X_1, \ldots, Y_m)) - \mathbb{E}(\max(X, X_1, \ldots, X_k))\right) + \mathbb{E}(\max(X, X_1, \ldots, X_k))$$

and $\mathbb{E}(\max(Y, X_1, \ldots, Y_m))$ analogously and comparing contradicts the falsity of the
Lemma. □